feat(evolve): Phase 2 of /evolve --mode=loop — ladder + cron self-adjust + typed blocked events (soc-g2qd #phase-2) by boshu2 · Pull Request #397 · boshu2/agentops

boshu2 · 2026-05-21T17:47:48Z

Why

Phase 2 of soc-g2qd: ship the CLI enforcement primitives the skill's prompt-text alone can't guarantee.

| Bead | What it gives the operator-loop |
|---|---|
| soc-mlbm | `ao evolve next-work`: 5-step programmatic ladder; agent stops guessing what to claim next |
| soc-un0m | `ao cron self-adjust`: renders cron template + emits JSON spec; replaces manual CronList/Delete/Create per cycle |
| soc-g34d | `ao evolve blocked`: typed blocked events at `.agents/evolve/blocked.jsonl`; agent logs rather than halts |
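
To make the ladder idea concrete, here is a minimal sketch of an ordered fall-through selector. The five step names come from this PR; everything else (the `Step` type, `NextWork`, the placeholder step bodies, the `bead-a` IDs) is hypothetical and is not the code in `cli/internal/evolve/ladder/`. It assumes the first step that yields a result wins.

```go
package main

import "fmt"

// Step is one rung of a hypothetical next-work ladder. Pick returns the
// chosen item and true when this rung can select work from the candidates.
type Step struct {
	Name string
	Pick func(candidates []string) (string, bool)
}

// NextWork walks the rungs in order and returns the first selection.
// Assumption: the first rung that yields a result wins; the real ladder
// may combine rungs differently.
func NextWork(steps []Step, candidates []string) (step, item string, ok bool) {
	for _, s := range steps {
		if picked, found := s.Pick(candidates); found {
			return s.Name, picked, true
		}
	}
	return "", "", false
}

// never is a placeholder rung that selects nothing.
func never(_ []string) (string, bool) { return "", false }

// firstOf is a placeholder rung that takes the first candidate, if any.
func firstOf(c []string) (string, bool) {
	if len(c) == 0 {
		return "", false
	}
	return c[0], true
}

// demoLadder uses the five step names from the PR; the bodies are placeholders.
func demoLadder() []Step {
	return []Step{
		{Name: "shape_filter", Pick: never},
		{Name: "grep_siblings", Pick: never},
		{Name: "primitive_test", Pick: never},
		{Name: "cross_hop_pickup", Pick: never},
		{Name: "bug_fallback", Pick: firstOf},
	}
}

func main() {
	step, item, ok := NextWork(demoLadder(), []string{"bead-a", "bead-b"})
	fmt.Println(step, item, ok)
}
```

Returning the name of the rung that fired, alongside the item, lets a caller record why a piece of work was chosen rather than only what was chosen.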

What changed

| Surface | Change |
|---|---|
| `cli/cmd/ao/evolve_next_work.go` + `_test.go` | New subcommand + L2 integration tests |
| `cli/internal/evolve/ladder/` | 5-step ladder package (shape_filter, grep_siblings, primitive_test, cross_hop_pickup, bug_fallback) with table-driven unit tests |
| `cli/cmd/ao/cron.go` | New top-level `ao cron` command |
| `cli/cmd/ao/cron_self_adjust.go` + `_test.go` | New subcommand; calls `evolve.VerifyMarkers` + `evolve.Render` from #394; writes audit row to `.agents/evolve/cron-history.jsonl`; emits JSON spec to stdout (harness orchestrates CronCreate) |
| `cli/cmd/ao/evolve_blocked.go` + `_test.go` | New subcommand: `--reason` (write), `--list [--tail N] [--json]` (read), `--clear <cycle>` (operator) |
| Generated: `cli/docs/COMMANDS.md`, `registry.json`, `docs/cli-skills-map.md` | Regen for 3 new subcommands |
| `evals/agentops-core/cli-command-surface-matrix.json` + smoke fixture | Counts bumped 73/199/272 → 74/202/276 |
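
The `--list [--tail N]` read path implies returning only the last N records of a JSONL log. One plausible way to do that is a sliding window over the lines, sketched below. The function names `tailLines` and `tailString` are hypothetical and say nothing about how `evolve_blocked.go` actually implements the flag.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// tailLines returns the last n non-empty lines read from r.
// The window never holds more than n lines after each step.
// Note: bufio.Scanner rejects lines longer than its buffer (64 KiB by default).
func tailLines(r io.Reader, n int) ([]string, error) {
	if n <= 0 {
		return nil, nil
	}
	var window []string
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue // skip blank lines between records
		}
		window = append(window, line)
		if len(window) > n {
			window = window[1:] // drop the oldest line
		}
	}
	return window, sc.Err()
}

// tailString is a convenience wrapper for in-memory input.
func tailString(s string, n int) ([]string, error) {
	return tailLines(strings.NewReader(s), n)
}

func main() {
	log := "{\"cycle\":1}\n{\"cycle\":2}\n{\"cycle\":3}\n"
	lines, err := tailString(log, 2)
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```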

How tested

- L2 integration: each new subcommand has L2 tests using fixture workspaces
- L1 unit: per-step table tests for the 5-step ladder, plus JSONL schema validation on blocked records
- Mechanical: `go test ./cli/...` 0 → 0 failures; `cli-command-surface-smoke.sh` → `cli-help-matrix-ok`; `check-no-tracked-agents.sh` exits 0
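
As an illustration of what table-driven schema validation on a blocked record might look like, here is a small sketch. The `BlockedEvent` fields (`cycle`, `reason`) are guesses based on the `--reason` and `--clear <cycle>` flags; the real record schema is not shown in this PR, so treat the struct and the rules in `validateLine` as assumptions.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// BlockedEvent is a hypothetical record shape, not the PR's actual schema.
type BlockedEvent struct {
	Cycle  int    `json:"cycle"`
	Reason string `json:"reason"`
}

// validateLine checks one JSONL line against the assumed schema:
// no unknown fields, a positive cycle, and a non-empty reason.
func validateLine(line string) error {
	dec := json.NewDecoder(strings.NewReader(line))
	dec.DisallowUnknownFields()
	var ev BlockedEvent
	if err := dec.Decode(&ev); err != nil {
		return fmt.Errorf("decode: %w", err)
	}
	if ev.Cycle <= 0 {
		return errors.New("cycle must be positive")
	}
	if strings.TrimSpace(ev.Reason) == "" {
		return errors.New("reason must be non-empty")
	}
	return nil
}

func main() {
	// Table of cases: each row pairs an input line with the expected verdict.
	cases := []struct {
		name  string
		line  string
		valid bool
	}{
		{"valid", `{"cycle":3,"reason":"waiting on review"}`, true},
		{"missing reason", `{"cycle":3}`, false},
		{"unknown field", `{"cycle":3,"reason":"x","extra":1}`, false},
	}
	for _, c := range cases {
		got := validateLine(c.line) == nil
		fmt.Printf("%s: valid=%v match=%v\n", c.name, got, got == c.valid)
	}
}
```

In a real test file the same table would sit inside a `func TestXxx(t *testing.T)` with `t.Run(c.name, ...)` per row.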

Counts

CLI heading counts: top 73 → 74, sub 199 → 202, all 272 → 276.

Sibling pattern: cron-history.jsonl + blocked.jsonl follow the cycle-history.jsonl JSONL append-only shape from soc-5qit. Ladder structure mirrors the in-prompt cascade in references/scout-mode.md — making it programmatic per §A5.
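
The append-only JSONL shape mentioned above can be sketched as follows: one JSON object per line, opened with `O_APPEND` so existing rows are never rewritten. The `row` type, `appendJSONL`, and `demoAppend` are hypothetical names for illustration, and the field set is an assumption rather than the schema used by `cycle-history.jsonl`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// row is a hypothetical audit record; the real field set is not shown in the PR.
type row struct {
	Cycle  int    `json:"cycle"`
	Reason string `json:"reason"`
}

// appendJSONL marshals v as a single line and appends it to path,
// creating the file if needed. Existing content is never modified.
func appendJSONL(path string, v any) error {
	b, err := json.Marshal(v)
	if err != nil {
		return err
	}
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	if _, err := f.Write(append(b, '\n')); err != nil {
		f.Close()
		return err
	}
	return f.Close() // surface close errors, which can hide failed writes
}

// demoAppend writes two rows to a temp file and returns the file contents.
func demoAppend() (string, error) {
	dir, err := os.MkdirTemp("", "jsonl-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "history.jsonl")
	for _, r := range []row{{1, "first"}, {2, "second"}} {
		if err := appendJSONL(path, r); err != nil {
			return "", err
		}
	}
	b, err := os.ReadFile(path)
	return string(b), err
}

func main() {
	out, err := demoAppend()
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```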

[no-sibling for cron-self-adjust] First-of-kind: no prior subcommand emits a cron-spec JSON for harness orchestration. The CLI does the safe work (template render + marker verify + audit row); the harness owns CronCreate.
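
The split described above (CLI verifies and renders, harness creates the cron) might look roughly like the sketch below. Nothing here is the real API: `verifyMarkers` and `render` are local stand-ins that do not reflect the signatures of `evolve.VerifyMarkers` or `evolve.Render`, and the `CronSpec` fields and the `# evolve-managed` marker are invented for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
	"text/template"
)

// CronSpec is a hypothetical spec shape handed to the harness.
type CronSpec struct {
	Schedule string `json:"schedule"`
	Prompt   string `json:"prompt"`
}

// marker is an invented sentinel; the real markers are not shown in the PR.
const marker = "# evolve-managed"

// verifyMarkers is a stand-in check that the template carries the sentinel.
func verifyMarkers(tmpl string) error {
	if !strings.Contains(tmpl, marker) {
		return errors.New("template missing managed marker")
	}
	return nil
}

// render is a stand-in that fills the template using text/template.
func render(tmpl string, data any) (string, error) {
	t, err := template.New("cron").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// buildSpec does the side-effect-free work and returns the JSON spec.
// Creating the cron entry is left to the caller (the harness).
func buildSpec(tmpl, schedule string, cycle int) (string, error) {
	if err := verifyMarkers(tmpl); err != nil {
		return "", err
	}
	prompt, err := render(tmpl, struct{ Cycle int }{cycle})
	if err != nil {
		return "", err
	}
	out, err := json.Marshal(CronSpec{Schedule: schedule, Prompt: prompt})
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	spec, err := buildSpec("# evolve-managed\nrun cycle {{.Cycle}}", "*/30 * * * *", 7)
	if err != nil {
		panic(err)
	}
	fmt.Println(spec) // the spec goes to stdout for the harness to consume
}
```

Keeping the command limited to producing a spec means it can be tested by comparing output strings, with no scheduler involved.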

See: docs/plans/2026-05-21-evolve-loop-epic-design.md §A4, §A5, §A6

Closes-scenario: soc-mlbm#next-work-ladder
Closes-scenario: soc-un0m#cron-self-adjust
Closes-scenario: soc-g34d#typed-blocked-events
Bounded-context: BC5-Runtime
Evidence: cli/cmd/ao/evolve_next_work.go

…ust + typed blocked events (soc-g2qd #phase-2)

Closes 3 Phase-2 sub-beads of soc-g2qd in one PR (same surface cli/cmd/ao/, inter-dependent):

| Bead | Surface |
|---|---|
| soc-mlbm | `ao evolve next-work`: 5-step programmatic ladder (cli/internal/evolve/ladder) |
| soc-un0m | `ao cron self-adjust`: render cron template via evolve.Render; emit JSON spec to stdout (harness orchestrates CronCreate) |
| soc-g34d | `ao evolve blocked`: typed blocked-event log at .agents/evolve/blocked.jsonl (--reason write / --list read / --clear) |

## What changed

| Surface | Change |
|---|---|
| `cli/cmd/ao/evolve_next_work.go` + `_test.go` | New subcommand + L2 integration tests |
| `cli/internal/evolve/ladder/ladder.go` + `_test.go` | 5-step ladder package (shape_filter, grep_siblings, primitive_test, cross_hop_pickup, bug_fallback) with table-driven unit tests |
| `cli/cmd/ao/cron.go` | New top-level `ao cron` command (parent for self-adjust) |
| `cli/cmd/ao/cron_self_adjust.go` + `_test.go` | New subcommand; calls evolve.VerifyMarkers + evolve.Render from #394; writes audit row to .agents/evolve/cron-history.jsonl; emits JSON spec to stdout |
| `cli/cmd/ao/evolve_blocked.go` + `_test.go` | New subcommand: `--reason` (write), `--list [--tail N] [--json]` (read), `--clear <cycle>` (operator-only) |
| Generated: COMMANDS.md, registry.json, cli-skills-map.md | Regen for 3 new subcommands |
| evals/agentops-core canary counts | Bumped: top 73→74, sub 199→202, all 272→276 |

## How tested

- L2 integration: each subcommand has L2 tests using fixture workspaces and asserting structural equality on outputs
- L1 unit: ladder per-step table tests (5 steps × multiple cases each); JSONL schema validation on blocked-event records
- Mechanical: `go test ./cli/...` green; cli-command-surface-smoke.sh green; check-no-tracked-agents.sh green; TestCobraConformance green

Sibling pattern: cron-history.jsonl + blocked.jsonl follow the cycle-history.jsonl JSONL append-only shape from soc-5qit. Ladder is novel but its step structure mirrors the in-prompt cascade in `references/scout-mode.md`, making it programmatic per §A5.

Fitness: tests roughly +33 → ~33/33 new tests passing (5 ladder steps × ~3 cases each + 3 L2 subcommand + 3 L1 schema). go test ./cli/cmd/ao + ./cli/internal/evolve/ladder green.

[no-sibling for cron-self-adjust] First-of-kind: no prior subcommand emits a cron-spec JSON for harness orchestration. The pattern is intentionally minimal: the CLI does the safe work (template render + marker verify + audit row); the harness owns CronCreate.

See: docs/plans/2026-05-21-evolve-loop-epic-design.md §A4, §A5, §A6

Closes-scenario: soc-mlbm#next-work-ladder
Closes-scenario: soc-un0m#cron-self-adjust
Closes-scenario: soc-g34d#typed-blocked-events
Bounded-context: BC5-Runtime
Evidence: cli/cmd/ao/evolve_next_work.go

…tors (soc-2gd6 #eval-hard-fails) (#402)

## Why

The v2.42.0 release gate (`scripts/ci-local-release.sh`) was red on 8 evals. The 3 score-0/near-0 hard fails are all **eval-staleness behind legitimate recent refactors**: verified, not gaming or security weakening. Operator decision: update eval to match source of truth (executable > contract).

| Eval | Was | Cause | Fix |
|---|---|---|---|
| `hook-manifest-command-counts` | 0 | `session-pr-counter.sh` (PR #362) is the legit 37th hook script; eval hardcoded 43/36 | bump expected counts 43→44, 36→37 |
| `push-worktree landing-plane` | 0.14 | #387 tiered-AGENTS split moved "Landing the Plane" to `AGENTS-WORKFLOW.md` (+ dropped 2 lines) | redirect eval target `AGENTS.md`→`AGENTS-WORKFLOW.md` + restore the 2 dropped policy lines |
| `security-toolchain ci-soft-gate-policy` | 0 | gate is intentionally **HARD** (no `continue-on-error`); job already runs `security-gate.sh --mode quick` + uploads artifacts | drop the stale `continue-on-error` requirement (security stays HARD) |

**Security note:** `security-toolchain-gate` stays a HARD blocking gate. Only the stale "soft gate" assertion was removed from the eval; the actual scan + artifact upload + summary-blocking are unchanged.

## How tested

- hook-manifest jq → `hook-manifest-counts-ok`
- security smoke `ci-policy` → `security-toolchain-ci-policy-ok`
- all 7 landing-plane strings present in `AGENTS-WORKFLOW.md`
- shellcheck clean on edited smoke

## Scope honesty

This fixes the 3 **hard** fails only. The release gate still has **5 minor evals (0.71–0.99)** + the **vil/release-smoke** lane, a separate remediation, deliberately NOT in this PR (no green-washing).

Sibling pattern: same "update eval to match legitimately-changed source of truth" move as the cli-command-surface canary bumps in #396/#397.

Fitness: release-gate eval hard-fails 3 → 0.

Closes-scenario: soc-2gd6#eval-hard-fails
Bounded-context: BC4-Validation
Evidence: evals/agentops-core/fixtures/security-toolchain-governance-smoke.sh

github-actions Bot added docs cli labels May 21, 2026

boshu2 force-pushed the feat/evolve-loop-phase2-soc-g2qd branch from 7613842 to 1c501ec Compare May 21, 2026 18:00

boshu2 merged commit 0759a74 into main May 21, 2026
71 checks passed

boshu2 deleted the feat/evolve-loop-phase2-soc-g2qd branch May 21, 2026 18:09

boshu2 mentioned this pull request May 22, 2026

fix(evals): unstale 3 release-gate eval hard-fails behind legit refactors (soc-2gd6 #eval-hard-fails) #402

Merged

boshu2 merged 1 commit into main from feat/evolve-loop-phase2-soc-g2qd
